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Abstract. We review basic modeling approaches for failure and main- 
tenance data from repairable systems. In particular we consider im- 
perfect repair models, denned in terms of virtual age processes, and 
the trend-renewal process which extends the nonhomogeneous Poisson 
process and the renewal process. In the case where several systems of 
the same kind are observed, we show how observed covariates and un- 
observed heterogeneity can be included in the models. We also consider 
various approaches to trend testing. Modern reliability data bases usu- 
ally contain information on the type of failure, the type of maintenance 
and so forth in addition to the failure times themselves. Basing our work 
on recent literature we present a framework where the observed events 
are modeled as marked point processes, with marks labeling the types 
of events. Throughout the paper the emphasis is more on modeling 
than on statistical inference. 

Key words and phrases: Repairable system, preventive maintenance, 
nonhomogeneous Poisson process, renewal process, marked point pro- 
cess, virtual age process, trend-renewal process, heterogeneity, trend, 
competing risks. 



1. INTRODUCTION 

According to a commonly used definition of a re- 
pairable system [5], this is a system which, after 
failing to perform one or more of its functions satis- 
factorily, can be restored to fully satisfactory perfor- 
mance by a method other than replacement of the 
entire system. For the present paper and following 
recent literature on the subject, we suggest extend- 
ing this definition to include the possibility of ad- 
ditional maintenance actions which aim at servicing 
the system for better performance. We shall refer 
to this as preventive maintenance (PM), where one 
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may further distinguish between condition-based PM 
and planned PM. The former type of maintenance is 
due when the system exhibits inferior performance, 
while the latter is performed at predetermined points 
in time. In this presentation we will consider some 
aspects of condition-based PM, while the planned 
PM will be briefly touched on in the concluding re- 
marks. 

Traditionally, the literature on repairable systems 
is concerned with modeling failure times, with point 
process theory being the main tool. The most com- 
monly used models for the failure process of a re- 
pairable system are renewal processes (RP), includ- 
ing the homogeneous Poisson processes (HPP) and 
nonhomogeneous Poisson processes (NHPP). While 
such models often are sufficient for simple reliability 
studies, the need for more complex models has of 
course emerged. 

There is currently a rapidly increasing literature 
concerning modeling and analysis of recurrent events, 
with a wide range of applications, including relia- 
bility analysis of repairable systems, which is the 
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present topic. In a recent review paper, Cook and 
Lawless [14] presented several examples from medi- 
cal studies where models and methods for recurrent 
events are appropriate. The review paper by Peha 
[55] gave examples from both medical and reliabil- 
ity studies. The scope of our paper is biased toward 
reliability applications, although most of the mod- 
els considered have a wider applicability. We will, 
in particular, consider models which incorporate ef- 
fects of different kinds of repair and maintenance, 
and with the possibility of handling several failure 
causes, for example. 

In a review paper like this, it is of course impossi- 
ble to cover all models or methods which have been 
suggested in the literature. Our aim is rather to em- 
phasize some important ideas, and in this respect 
there will be a clear bias toward work in the direc- 
tion of our own interests and in work by ourselves 
and collaborators. Throughout the paper the em- 
phasis will be more on modeling than on statisti- 
cal inference. In addition we will try to give some 
historical perspectives on the theory and practice 
related to repairable systems, again not necessarily 
complete and possibly biased by our own views. 

One of the first comprehensive treatments of sta- 
tistical methods for recurrent events with reliability 
emphasis is the talk by David R. Cox, read before 
the Royal Statistical Society in London in March 
1955 and published in [17]. Cox touched a large 
number of topics, most of them motivated from the 
clothing industry. Topics of particular importance 
for reliability applications were trend testing, test- 
ing whether a failure process is a Poisson process, 
autocorrelated time gaps, doubly stochastic Poisson 
processes, heterogeneity between systems, correla- 
tions between different types of events, mean repair 
times, availability of service and so forth. Many re- 
sults from the paper are contained in the subsequent 
book by Cox and Lewis [19], which still is a very use- 
ful and much cited source on the subject. 

Another early contribution to the study of re- 
pairable systems is the heavily cited 1963 paper by 
Proschan [58], "Theoretical explanation of observed 
decreasing failure rate." This paper is particularly 
important since it led to the awareness that proper 
analysis of recurrent events is an important part 
of reliability theory. In particular it is one of the 
first treatments of heterogeneity in the theory of re- 
pairable systems. 

What seems to be the first book devoted solely to 
repairable systems reliability was published by As- 
cher and Feingold [5] in 1984. For a long time this 



was the main reference for repairable systems and 
it is still a major source. The subtitle of the book is 
Modeling, Inference, Misconceptions and Their Causes. 
The authors were complaining that reliability re- 
searchers and practitioners did not recognize the 
crucial difference between the statistical treatment 
of repairable systems and nonrepair able components. 
They demonstrated by simple examples how conclu- 
sions from data may be very wrong if times between 
failures are treated as i.i.d. if there is a trend in 
them. 

Data from repairable systems are usually given 
as ordered failure times T±, T2, ■ ■ . with data coming 
from a single system or from several systems of the 
same kind. The implicit assumption is usually that 
the system is repaired and put into new operation 
immediately after the failure. This restriction, dis- 
regarding repair times, is not serious if one is inter- 
ested in modeling and estimation of the probability 
mechanisms behind failure occurrences. It is, more- 
over, justified if the time scale is taken to be oper- 
ation time, number of cycles, number of kilometers 
run and so forth. We will impose this restriction in 
this paper, and we will therefore not cover impor- 
tant topics such as availability and unavailability of 
systems, where the standard tool is to use alternat- 
ing renewal processes with operation periods alter- 
nating with repair periods (see, e.g., [59], Chapter 
7). 

A common recipe for analysis of data from a re- 
pairable system is as follows. First, apply a test 
for trend in the interfailure times Aj = Tj — Ti—\. 
If no significant trend is found, then use a RP as 
a model, in which case the well established statis- 
tical tools for analysis of i.i.d. observations can be 
used. Otherwise, use a NHPP model, which handles 
trend through specification of an intensity function 
A(i). For example, a deteriorating system will then 
correspond to an increasing function A(i), while an 
improving system will correspond to a decreasing 
\(t). A homogeneous Poisson process, HPP(A), cor- 
responds to a constant intensity X(t) = A and is at 
the same time a renewal process with exponentially 
distributed interfailure times. 

A RP model is also called a perfect repair model, 
since the system is as good as new after a failure. On 
the other hand, a NHPP model corresponds to what 
is called minimal repairs, meaning that the system 
after repair is only as good as it was immediately 
before the failure. Lindqvist, Elvebakk and Hegg- 
land [48] represent the problem of distinguishing 
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between the two "extreme" kinds of repair as cor- 
responding to the first "dimension" of a repairable 
system description in the form of a so-called model 
cube (Figure 3). The second dimension is the ap- 
pearance of trend or no trend in interfailure times. 
This particular aspect of system behavior has tra- 
ditionally received much attention in reliability the- 
ory and is resolved by considering trend tests. Fi- 
nally, the third dimension corresponds to the exis- 
tence of unobserved heterogeneity between systems. 
This problem is of course relevant only when sev- 
eral systems of the same kind are observed. There is 
currently a large and increasing interest in the mod- 
eling of heterogeneity, usually known as frailties in 
the survival analysis literature. To some extent, het- 
erogeneity may have been much overlooked in relia- 
bility studies, but there are important exceptions in 
the literature. 

Several classes of models have in turn been sug- 
gested for cases not covered by the "extreme" mod- 
els RP and NHPP. These include the so-called im- 
perfect repair models. The idea is that after a re- 
pair the "virtual" age of the unit is not necessarily 
reduced to 0, such as for a perfect repair, nor is it 
the same as before the repair, such as for a mini- 
mal repair. Instead, the virtual age is reduced by a 
certain amount that depends on the type of repair. 
We review the basic properties of such models and 
we will see how the concept of virtual age can be 
generalized to more than one dimension. 

Another class of alternatives to NHPP and RP 
models, which includes these models, is the so-called 
trend-renewal process (TRP). This model is a gen- 
eralization of Berman's modulated gamma process 
[9] and has been extensively studied in [48]. In the 
present paper we will use TRP models and their ex- 
tensions as our basic framework to illustrate some 
main issues on modeling and statistical analysis of 
data from repairable systems. The TRP is partic- 
ularly suitable to illustrate the already mentioned 
three dimensions of repairable systems. 

Modern reliability data bases usually contain more 
information than just the failure times. For example, 
there may be information on the times of preventive 
maintenance (PM), identity of a failed component, 
type of failure, type of repair, cost of replacement 
and so forth. Thus we shall more generally assume 
that observations from repairable systems are repre- 
sented as marked point processes where the marks 
label the types of events. For example, the marks 
may be of two kinds, corresponding to the type of 



maintenance, repair or PM. We review some recent 
literature in this direction with the aim of extend- 
ing the theory of repairable systems to a competing 
risks setting. 

In addition to information on types of events, the 
data bases may contain covariates that represent en- 
vironmental conditions, measures of various forms of 
load and usage stress, and so forth. Such covariates 
could be constant or are possibly varying with time. 
Regression models for repairable systems are useful 
for obtaining better understanding of the underly- 
ing failure and PM mechanisms, or for predicting 
the behavior of new items. 

The outline of the paper is as follows. The basic 
notation and definitions used are given in Section 2, 
including the introduction of the marked point pro- 
cess setup. Section 3 reviews models for the case of 
failure data with a single type of events, with em- 
phasis on virtual age models and trend-renewal pro- 
cesses. Section 4 is devoted to a discussion of unob- 
served heterogeneity in repairable systems data. The 
model cube for heterogeneous trend-renewal processes 
is considered in particular. In Section 5 we consider 
various approaches to trend testing, both for data 
coming from single systems and from several sim- 
ilar systems. The possible extension of virtual age 
models to the marked process case is considered in 
Section 6. This section is based on some recent pa- 
pers on the subject. Some concluding remarks are 
given in Section 7, in particular concerning topics 
not covered in the main text. 

2. NOTATION AND BASIC DEFINITIONS 

We consider a repairable system where time usu- 
ally runs from t = and where events occur at or- 
dered times T\ , T<i , Here time is not necessarily 

calendar time, but can in principle be any suitable 
measurement which is nondecreasing with calendar 
time, such as operation time, number of cycles, num- 
ber of kilometers run, length of a crack and so forth. 
As already mentioned in the Introduction, we shall 
disregard time durations of repair and maintenance, 
so we assume that the system is always restarted 
immediately after failure or a maintenance action. 
Types of events (type of maintenance, type of fail- 
ure, etc.) are, when applicable, recorded as J±, J2, ■ . ■ 
with Jj £ J for some mark space J which will de- 
pend on the current application. For simplicity we 
will here always assume that J is a finite set. The 
observable process (Ti, Ji), (T2, J2), • • • will be called 
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Fig. 1. Event times (Tj), event types (J;) and sojourn times (Xi) of a maintained system. 



the marked event process or occasionally the failure 
process. The interevent, or interfailure, times will be 
denoted Xi, A2, — Here Xi = Ti — Ti_\,i = 1,2, ... , 
where for convenience we define Tq = 0. Figure 1 
illustrates the notation. We also make use of the 
counting process representation Nj(t) equal to the 
number of events of type j in (0,t], which counts 
the number of events of type j € J , and N(t) = 
J2jej Nj(t), which counts the number of events ir- 
respective of their types. 

To describe probability models for repairable sys- 
tems we use some notation from the theory of point 
processes. A key reference is Andersen, Borgan, Gill 
and Keiding [4]. Let Tt- denote the history of the 
marked event process up to, but not including, time 
t. In models without covariates we assume that Tt— 
includes all information on event times and event 
types before time t. Formally, Tt- is generated by 
the set {Nj(s):0< s<t,j G J}. 

Suppose then that a possibly time-dependent co- 
variate vector Z(t) is observed for the system. In this 
case the covariate history {Z(s):0<s<i} should 
be added to the history Tt- for each t > 0. This will 
imply that just before any time t we have the com- 
plete information on the previous events, as well as 
the complete covariate history including the value 
of the covariate at time t. In the case of a time- 
constant covariate vector Z, the information in Z is 
added to each history Tt- ■ 

The conditional intensity of the process with re- 
spect to events of type j G J is now defined as 

7i(*) 

_ Pr(event of type j in [t,t + At)\T t -) 

which we call the type-specific intensity for j. Thus, 
jj(t)At is approximately the probability of an event 
of type j in the time interval [t, t + At) given the his- 
tory before time t. Further, we let j(t) = J2j<=j 7j(*) 
so that j(t)At is approximately the conditional prob- 
ability of an event of any type in the time interval 
[t,t + At), where it has been tacitly assumed that 



the probability of more than one event in an inter- 
val [t,t + At) is o(At). Note that the Tj(-) and hence 
the 7(-) may be functions of the covariate vector Z(-) 
when appropriate. In typical applications, 7j(i) may 
depend on the covariate history only through the 
value Z(i) at time t. Further, it is common to assume 
that Jj(t) = jj(t)g(Z(t)), with 7?(i) depending only 
on the pure event history {Nj(s) : < s < t, j G J}, 
and with g{-) being some parametric function of 
the covariate vector such as the exponential one, 
g(z) =exp(/3'z), where f3 is a parameter vector. 

For statistical inference we need an expression for 
the likelihood function. Suppose that a single system 
with a marked event process as described above is 
observed from time to time r, resulting in observa- 
tions (Ti, Ji), (T 2 , J 2 ), . . • , (2jv( t ), Jjv(t))> in addition 
to the covariate vector Z(s) for < s < t if appli- 
cable. The likelihood function is then given by ([4], 
Section II. 7) 

(2) L=|n7 Ji (r i )|exp|-^ r 7(n)^|. 

A rough verification of (2) can be given as fol- 
lows. First, partition the interval (0,r] into s equal 
pieces, each of length h = t/s. Assume that s is 
so large that at most one event can happen in an 
interval of length h. Then the conditional proba- 
bility of an event of type j in the interval [(k — 
l)h, kh), k = 1, . . . , s, given the history before (k — 
l)h, is roughly r yj(kh)h, while the conditional prob- 
ability of no event in this interval is roughly 1 — 
j(kh)h. The probability of a realization of the pro- 
cess from to r will therefore include a product 
of N(t) terms of the type ^j{kh)h, corresponding 
to the observed events, and which in the limit as 
h — > (after dividing by the normalization h N ^ ) 
gives the product term on the right-hand side of 
(2). The exponential part of (2) comes from taking 
the limit of the product of the terms 1 — "f(kh)h ~ 
exp{— j(t) dt} for the intervals that contain 

no event, assuming continuity of 7(-). 

The likelihood function (2) is valid under the as- 
sumption that r is a stopping time, which means 
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that its value depends stochastically only on the 
past history. This property holds for the standard 
censoring schemes used in practice and in particular 
when r is independent of the event process. There 
is, however, an increasing awareness of the need to 
allow for dependent censoring in many applications 
(see, e.g., [33]). 

In typical applications, data will be available for 
several similar systems, with stopping times r usu- 
ally varying from system to system. Under the as- 
sumption of stochastic independence and identical 
probability mechanisms for the systems, the total 
likelihood will be the product of expressions (2) com- 
puted for all systems. For both parametric and non- 
parametric models of this kind there is a well de- 
veloped theory for estimation based on the martin- 
gale approach to point processes ([4] gives a com- 
prehensive account). Relevant references for statis- 
tical inference in reliability models are, among oth- 
ers, Ascher and Feingold [5], Rausand and H0yland 
[59], Crowder, Kimber, Smith and Sweeting [21] and 
Meeker and Escobar [52]. 

3. MODELS FOR REPAIRABLE SYSTEMS 
WITH A SINGLE TYPE OF EVENT 

In the present section we assume that the observa- 
tions are just the failure times T\,T2, Thus the 

mark space J will be ignored. 

A large number of models can be obtained in terms 
of a given hazard function z(t), which we think of as 
being the hazard function of the time to first failure 
of a new system. The corresponding density and cu- 
mulative distribution function are denoted, respec- 
tively, /(*) and F(t), so z(t) = /(t)/(l - F(t)). The 
idea is to use the function z(t) together with a spec- 
ification of the repair strategy to define the condi- 
tional intensity function ^{t) of the failure process. 
Models of this type are considered in Sections 3.1 
and 3.2. The corresponding models may be extended 
to the case with observed covariates, although this 
will not be made explicit. As described in Section 2, 
the conditional intensities of the form 7(t) as consid- 
ered below may be multiplied with a factor g(Z(t)) 
that defines the dependence of the covariate value 
at time t. 

3.1 Perfect and Minimal Repair Models 

Suppose first that after each failure, the system 
is repaired to a condition as good as new. In this 



case the failure process is modeled by a renewal pro- 
cess with interevent time distribution F, denoted 
RP(.F). Clearly 

l( t ) = z(t-T N{t _ ) ), 

where t — 2jy(t-) is the time since the last failure 
strictly before time t. 

Suppose instead that after a failure, the system 
is repaired only to the state it had immediately be- 
fore the failure, called a minimal repair. This means 
that the conditional intensity of the failure process 
immediately after the failure is the same as it was 
immediately before the failure, and hence is exactly 
as it would be if no failure had ever occurred. Thus 
we must have 

7(f) =*(*)> 

so that the process is a NHPP with intensity z(t), 
denoted NHPP(z(-)). In practice a minimal repair 
usually corresponds to repairing or replacing only a 
minor part of the system. 

3.2 Imperfect Repair Models and the Virtual 
Age of a System 

A classical model, suggested by Brown and Proschan 
[13], assumes that at the time of each failure a per- 
fect repair occurs with probability p and a minimal 
repair occurs with probability 1—p, independently 
of the previous failure history. This model is a sim- 
ple example of what has been called an imperfect 
repair model, and was later generalized in several 
directions. 

Kijima [34] suggested two imperfect repair mod- 
els, both involving what is called the virtual age (or 
effective age) of the system. The idea is to distin- 
guish between the system's age, which is the time 
elapsed since the system was new, usually at time 
t = 0, and the virtual age of the system, which de- 
scribes its present condition when compared to a 
new system. The virtual age is redefined at failures 
according to the type of repair performed and it runs 
along with the true time between repairs. More pre- 
cisely, a system with virtual age v > is assumed to 
behave exactly like a new system which has reached 
age v without having failed. The hazard rate of a 
system with virtual age v is thus z v (t) = z(v + 1) for 
t > 0, where z(-) is the hazard rate of the time to 
first failure of the system. 

It should be clear at this stage that models based 
on virtual ages make sense only if the underlying 
hazard functions z(-) are nonconstant. In fact, if z(-) 
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is constant, then a reduction of virtual age would not 
influence the rate of failures. 

A variety of imperfect repair models can be ob- 
tained by specifying properties of the virtual age 
process in addition to the hazard function z(t) of 
a new system. For this, suppose v(i) is the virtual 
age of the system immediately after the ith event, 
% = 1, 2 The virtual age at time t > is then de- 
fined by A(t) = v(N(t-)) + t - Tjv(t-) , which is the 
sum of the virtual age after the last event before t 
and the time elapsed since the last event. The pro- 
cess A(t), called the virtual age process by Last and 
Szekli [40], thus increases linearly between events 
and may jump only at events. It follows that 

(3) y(t) = z v ( N (t-))(t-T N( t_)) = z(A(t)), 

assuming that A(t) is included in Tt- for all t. This 
means in turn that v(i) is contained in J-^ for each 
t so that v(i) depends on the history up to and in- 
cluding Ti. The likelihood then becomes 

fN(r) ^ 
L=lY[z(v(i-l)+Xi)\ 

f f Xi 
•exp|— J z(v(i — 1) + u) du 

j-r-T N(T) "I 

-J z{v(N(T))+u)du\. 

This can be recognized as being the same as 
rN{ T ) >j 

| II fv(i-l)( X i) ji 1 - F v(N(t))( t - T N(t))}, 

where f v (t) = f(v + t)/(l- F(v)) and F v (t) = (F(v + 
t) — F(v))/(1 — F(v)) are, respectively, the density 
and the cumulative distribution function of time to 
next failure for a system with virtual age v and 
hence with hazard rate z v {-). 

It is clear that the perfect repair and minimal re- 
pair models are the special cases where, respectively, 

v{i) = and v(i) =Ti, i = 1, 2, In Kijima's [34] 

model I, the virtual age v(i) equals J2k=i^kXk, 
where D\ , Di , ... is a sequence of random variables 
on the interval [0, 1] such that is independent of 
TT k - for each k. Note that Tt h - includes Di,D2, 
. . . , -Dfc_i so that in particular the are indepen- 
dent. In Kijima's model II the virtual age v(i) is set 
t° J2k=i(U)=k Dj)Xk with the same conditions for 
the -Dfc. This means that the virtual age after the 
ith failure equals Di multiplied by the virtual age of 



the system just prior to the ith failure. The model 
of Brown and Proschan [13] is obtained when Di is 
1 with probability 1—p and with probability p for 
all i. 

Dorado, Hollander and Sethuraman [22] studied 
nonparametric statistical inference in a model slightly 
more general than Kijima's models described above. 
Nonparametric statistical inference in the Brown- 
Proschan model was first studied by Whitaker and 
Samaniego [63] and later by Hollander, Presnell and 
Sethuraman [31]. 

Recall that for the above models, the Di need 
to be observed for likelihood inference using (2) to 
be valid. This means in effect that the type of re- 
pair (minimal or perfect) must be reported for each 
repair action. In real applications, however, exact 
information on the type of repair is rarely avail- 
able. The estimation problem in the case of unob- 
served Di has been considered by, for example, Lim 
[45] (suggesting an EM algorithm approach) and 
Langseth and Lindqvist [38, 39]. 

Doyen and Gaudoin [23] studied classes of virtual 
age models based on deterministic reduction of vir- 
tual age due to repairs, and hence not requiring the 
observation of repair characteristics. The basic mod- 
els of this type can be obtained simply by letting the 
Di in Kijima's models above be replaced by para- 
metric functions. A simple example of [23] is to use 
1 — p for Di, where < p < 1 is a so-called age re- 
duction factor. 

There is a large literature on reliability modeling 
using the virtual age process. For a review we refer 
to Pham and Wang [57] and Lindqvist [46]. Section 
6 presents an attempt to define a multivariate vir- 
tual age process and corresponding repairable sys- 
tem models with several types of events. 

3.3 Generalized Linear Model Types 

Berman and Turner [10] considered estimation in 
parametric models with the conditional intensity be- 
ing of the generalized linear model type 

(4) 7(0=9{X>^)}, 

where g is a known monotonic continuous function, 
the Zi(t) are known functions of t and the history 
Tt- , and the are unknown parameters. Note that 
the functions Zi(t) may be functions of the covari- 
ates if available. One aim of the paper was to demon- 
strate how to use software for generalized linear mod- 
els to analyze repairable systems data. The model 
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(4) is closely related to the modulated renewal pro- 
cess introduced in [18] for which Cox suggested a 
semiparametric approach for inference using a par- 
tial likelihood. 

The special case of (4) obtained when g(y) = e y 
was applied to repairable systems by Lawless and 
Thiagarajah [43]. In particular, they considered the 
model 

(5) j(t) = e A)+/ 3 i9i(*)+/3 2 92(t-T JV( t_ ) ) ) 

where g\ and gi are known functions. This con- 
ditional intensity is a function of both the calen- 
dar time and the time since last failure. Note that 
Pi = gives a RP and /3 2 = gives a NHPP, while 
p 1= (3 2 = gives a HPP. 

3.4 The Trend-Renewal Process 

A class of processes called inhomogeneous gamma 
processes was suggested by Berman [9] . Berman mo- 
tivated the inhomogeneous gamma process by first 
considering the process Xx , T 2 , . . . obtained by ob- 
serving every Kth event of a NHPP, where k is a 
positive integer. He then showed how to generalize 
to the case when k is any positive number. 

We present now a generalization of Berman's idea, 
called the trend-renewal process, which was exten- 
sively studied by Lindqvist, Elvebakk and Heggland 
[48]. We will use this process in particular to de- 
scribe the three dimensions related to the properties 
of repairable systems. 

The idea behind the trend-renewal process is to 
generalize the following well-known property of the 
NHPP. First let the cumulative intensity function 
that corresponds to an intensity A(-) be defined by 
A(t) = J* \(u) du. Then if T u T 2 , . . . is a NHPP(A(-)), 
the time-transformed stochastic process A(Ti), 
A(T 2 ),... is HPP(l). 

The trend-renewal process (TRP) is defined sim- 
ply by allowing the above HPP(l) to be any renewal 
process RP(-F). Thus, in addition to the intensity 



function A(i), for a TRP we need to specify a dis- 
tribution function F of the interarrival times of this 
renewal process. Formally we can define the process 
TRP(F,A(-)) as follows: 

Let X(t) be a nonnegative function defined for 
t > 0, and let A(t) = /q X(u) du. The process T\, T 2 , . . . 
is called TRP(i ? , A(-)) if the transformed process 
A(Ti),A(T 2 ),... is RP(F), that is, if the A(T;) - 
A(Tj_i), i =1,2,..., are i.i.d. with distribution func- 
tion F. The function A(-) is called the trend func- 
tion, while F is called the renewal distribution. To 
have uniqueness of the model, it is usually assumed 
that F has expected value 1. 

Figure 2 illustrates the definition. For a NHPP(A(-)), 
the RP(F) would be HPP(l). Thus TRP(1 - e~ x , 
A(-)) = NHPP(A(-)). Also, TRP(F, 1) = RP(F), which 
shows that the TRP class includes both the RP and 
NHPP classes. 

As a motivation for the TRP model, suppose that 
failures of a particular system correspond to replace- 
ment of a major part, for example, the engine of a 
tractor (as in the data given by Barlow and Davis 
[6]), while the rest of the system is not maintained. 
Then if the rest of the system is not subjected to 
wear, a renewal process would be a plausible model 
for the observed failure process. In the presence of 
wear, on the other hand, an increased replacement 
frequency is to be expected. This is achieved in a 
TRP model by accelerating the internal time of the 
renewal process according to a time transformation 
A(t) = J^X^du which represents the cumulative 
wear. The TRP model thus has some similarities to 
accelerated failure time models. 

It can be shown [48] that the conditional intensity 
function for the TRP(F,A(-)) is 

(6) 1 (t) = z(A(t)-A(T N[t _ ) ))X(t), 

where z(-) is the hazard rate that corresponds to F. 
This is a product of one factor, X(t), which depends 
on the age t of the system and one factor which de- 
pends on a transformed time from the last previous 



I) Ti T-2 7a * 

I — —f— \ \ — - TRP(F,A(-)) 



Fig. 2. 
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failure. However, time since last failure is measured 
on a scale that depends on the cumulative intensity 
of failures. This shows that the TRP class does not 
contain, nor is contained in, the classes of processes 
considered in the previous subsection. 

Suppose now that a single system has been ob- 
served in [0, t] , with failures at T\ , T2 , ■ ■ ■ , T N r T \ . If a 
TRP(F, A(-)) is used as a model, then substitution 
of (6) into (2) gives the likelihood 

(N(t) } 




1 1=1 



eXP l X ~ A ( T N(u-))]K u ) du j- 

Equivalently, if / is the density function that corre- 
sponds to F, we can write this likelihood as 

,N(t) ^ 
L=lY[f[A(T l )-A(T l ^ 1 )]X(T l )\ 

(8) • {1 -F[A(r) - A(Tjv( t ))]}. 

The latter form of the likelihood follows directly 
from the definition of the TRP, since the condi- 
tional density of Tj given T\ = ti, . . . , = t.;_i is 
f[A(ti) — A(tj_i)]A(ij), and the probability of no fail- 
ures in the time interval (T N ^ , r] , given T\ , . . . , T N ^ , 
isl-F[A(r)-A(T^ (T) )]. 

A possible extension of the TRP to include co- 
variates would be to multiply the trend function 
X(t) by a factor g(Z(t)), for example, of the form 
exp(/3'Z(t)) as suggested in Section 2. The X(t) would 
then play the role of a baseline trend function. This 
definition generalizes in a natural way the commonly 
used NHPP model with covariates; see, for example, 
[41]. 

4. UNOBSERVED HETEROGENEITY IN 
REPAIRABLE SYSTEMS 

Analyses of reliability data often lead to an appar- 
ent decreasing failure rate which could be counter- 
intuitive in view of wear and aging effects. Proschan 
[58] pointed out that such observed decreasing rates 
could be caused by unobserved heterogeneity. 
Proschan presented failure data from 17 air condi- 
tioner systems on Boeing 720 airplanes. Applying 
Mann's [51] nonparametric trend test to each sys- 
tem and then combining to a global test statistic, 
he argued that there is no significant trend in the 



failure times for each separate plane. He then con- 
cluded by a similar test that "it seems safe to accept 
the exponential distribution as describing the fail- 
ure interval, although to each plane may correspond 
a different failure rate." He demonstrated this last 
fact statistically by using a result from Barlow, Mar- 
shall and Proschan [7] which implies that a mixture 
of exponential distributions has a decreasing failure 
rate. More precisely, he applied again the Mann test, 
which is sensitive to a decreasing failure rate, on the 
pooled interfailure times from all the planes. In this 
way he obtained a p-value of 0.007 for the null hy- 
pothesis of identical exponential distributions of the 
interfailure times. 

Heterogeneity in connection with Poisson processes 
was in fact studied as early as 1920 by Greenwood 
and Yule [27], who used a compound Poisson dis- 
tribution. Later, Maguire, Pearson and Wynn [50], 
studying occurrences of industrial accidents, showed 
how Laplace transforms enter general expressions 
for resulting distributions of intervals and counts. 
Cox [17] considered the possibility of heterogeneity, 
which he called variance components, between ho- 
mogeneous Poisson processes and listed several rea- 
sons for the interest in such models for repairable 
systems data. 

It has similarly long been known in biostatistics 
that neglecting individual heterogeneity may lead to 
severe bias in estimates of lifetime distributions. The 
idea is that individuals or components have differ- 
ent "frailties" and that those who are most "frail" 
will die or fail earlier than the others. This in turn 
leads to a decreasing population hazard, which has 
often been misinterpreted. Important references on 
heterogeneity in the biostatistics literature are [62], 
[32] and [2]. It should be noted that heterogeneity 
is, in general, unidentifiable if it is considered as 
an individual quantity. For identifiability it is neces- 
sary that frailty be common to several individuals, 
for example, in family studies in biostatistics, or if 
several events are observed for each individual, such 
as for the repairable systems considered in this pa- 
per and more generally for recurrent events data. 
The presence of heterogeneity is often apparent for 
data from repairable systems if there is a large vari- 
ation in the number of events per system. However, 
it is not really possible to distinguish between het- 
erogeneity and dependence of the intensity on past 
events for a single process. It is a fact, though, that 
ignorance of an existing heterogeneity may lead to 
suboptimal or even wrong decisions. 
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4.1 Modeling Heterogeneity for Repairable 
Systems 

The common way to model heterogeneity is to in- 
clude an unobservable multiplicative constant in the 
conditional intensity of the process; see, for exam- 
ple, [62]. For systems with a single type of event this 
is done by first replacing the conditional intensities 
7(t) in (1) by a^(t), where a is a random variable 
that represents the frailty of the system and such 
that a is included in T t - for each t. Note that 7(t) as 
described in Section 2 may well be a function of co- 
variates. Now a can be viewed as being the effect of 
an unobserved covariate. Systems with a large value 
of a will have a larger failure proneness than sys- 
tems with a low value of a. Intuitively, the variation 
in the a between systems implies that the variation 
in observed number of failures among the systems 
is larger than would be expected if the failure pro- 
cesses were identically distributed. Now, since a is 
unobservable, one needs to take the expectation of 
the likelihood that results from (2) with respect to 
the distribution of a in order to have a likelihood 
function for the observed data. 

In the marked point process formulation of Sec- 
tion 2 we may more generally assume that there are 
different frailty variables for each event type j € J . 
More precisely, we assume that there is a random 
vector a = (aj,j G J) such that the type-specific in- 
tensities for given a are aj^j{t), respectively, where 
7j(i) corresponds to the type-specific conditional in- 
tensity defined in Section 2. The resulting likelihood 
including heterogeneity is thus 



L = E H 



(9) 



/N{t) \ 

n a Ji7Jii T i) 

\ i=l / 



exp< -J2 a J lj{u)du 



where the expected value is taken with respect to 
the joint distribution of a. Multivariate frailty dis- 
tributions are considered by, for example, Hougaard 
[32] and Aalen [1]. 

In the case of several independent systems, it is 
assumed that the a's that correspond to the sys- 
tems are i.i.d. from the given joint distribution. The 
total likelihood is then the product of factors (9), 
one for each system. Note that for identifiability it 
may be necessary to introduce a normalization of a, 
for example, to assume that £7(||a||) = 1. This is be- 
cause otherwise a scale factor may be moved from 



aj to 7,(-) or vice versa without changing the value 
of (9). Alternatively one may let the aj act as free 
random scale parameters in the model if the Jj(-) 
themselves do not include scale parameters. 

For the special case of a single type of event one 
obtains simplification of the likelihood function in (9), 



L = E n 



(10) 



N(r) 



,N(t) 



n 7(2*: 



i=l 



exp-j —a J "f(u) du 



where the expectation is with respect to the dis- 
tribution of the random variable a and where for 
normalization one will usually assume E(a) = 1. 

Expression (10) suggests that a gamma distribu- 
tion for a is mathematically convenient, since a closed 
form expression of the likelihood is obtained. More 
generally, for the version (9), a multivariate gamma 
distribution for a leads to a simplified expression 
(see, e.g., [1] and [32] regarding multivariate gamma 
distributions). 

Consider now the likelihood (10) and suppose that 
a is gamma distributed with E(a) = 1, Var(a) = 5. 
Then a straightforward computation gives 



L 




F(N(t) + 1/5) 



5 l / & T{l/5) [1 /5 + J7 7 (n) du\ N ^)+ 1 / s 




[5(N(t) - 1) + 1][5(N(t) - 2) + 1] ■ ■ ■ 1 
[<5j r 7(-u)du + l]^M+ 1 /5 

where we have used the fact that T(r + 1) = rT(r). 
Recall that j(Ti) may well include covariates. This 
likelihood expression is applicable, for example, to- 
gether with the virtual age model (3) and the gen- 
eralized linear model types (4) and (5). It is also the 
likelihood function for NHPPs with heterogeneity 
and possibly covariates, as studied in Lawless [41], 
and results in the likelihood of the so-called com- 
pound power law model studied by Engelhardt and 
Bain [26]. 

We remark that (11) converges to (2) (assuming 
a single type of event) as 5 — ► 0. 
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4.2 Heterogeneity in the TRP Model, the 
HTRP Model 

Lindqvist, Elvebakk and Heggland [48] introduced 
heterogeneity into the TRP model by including an 
unobservable random multiplicative constant a in 
the trend function X(t), thus considering the condi- 
tional model TRP(i ? , aA(-)) with a renewal distribu- 
tion F that does not depend on a. This definition 
is consistent with the regression version of TRP as 
suggested at the end of Section 3.4. Now the a re- 
places the function g(Z(t)) used there. Note that in 
practice one may want to include both the frailty a 
and a covariate factor g(Z(t)). To simplify the dis- 
cussion, we will, however, not consider covariates in 
our presentation. 

Considering (6), it is seen that the conditional in- 
tensity function given a is no longer of the simple 
multiplicative form a^y{t) which was assumed in the 
previous subsection. This is because the A(-) in (6) is 
also multiplied by o. Instead of the expression (10), 
the relevant likelihood from one system becomes, 
using (7), 



L = E„ 



rrN(r) 
A i=l 



(12) • exp<j -a J z[a(A(u) - A(T N{u ^))} 



X(u) du 



or, using (8), 



(13) 



L = E a l [] /[a(A(T i )-A(T i _i))]oA(T i ; 
I i=i 

■{l-F[a(A(r)-A(T N(T) ))]}. 



Here / and z are, respectively, as before, the density 
and hazard function of the distribution F. 

The expressions (12) and (13) appear to be less 
tractable than the expression (10). Lindqvist, Elve- 
bakk and Heggland [48] obtained, however, a rather 
simple expression for the likelihood in the case of 
an inhomogeneous gamma process with gamma dis- 
tributed heterogeneity factor a, under the further 
assumption that the stopping times r coincide with 
failure times. In this case the last factor of (13) dis- 
appears, and letting F be the gamma distribution 
with unit expectation and variance 7, while a is 



gamma distributed with unit expectation and vari- 
ance 8, one obtains 



■N(t) 



L=|n( A (^)- A (^i)) 1/7 - 1 A(T l )| 
•(r(iV(T)/7 + l/«5)) 

• T(l/S)[l/8 + (I/t)^))]^^}- 1 . 

1 this is of the same form as in 



Note that for 7 
(11). 

We use the notation HTRP(F, X(-),H) for the model 
with likelihood (12) or, equivalently, (13). The "H" 
in HTRP here stands for heterogeneity, and the H 
which is added to (F, A(-)) in the notation is the dis- 
tribution of the variable a, which can be any positive 
distribution with expected value 1. 

4.3 The Three Dimensions of a Repairable 
System Description: The Model Cube and 
the Log-Likelihood Cube 

A useful feature of the HTRP model is that several 
models for repairable systems can be represented as 




Fig. 3. The model cube illustrating the HTRP(F, A(-), H ) 
and the submodels obtained by restricting one or more of 
F,X(-),H to their basic versions, respectively, F being stan- 
dard exponential (using the notation - in the figure), X(t) = X 
being constant in time and H being the distribution determin- 
istic at 1 (- in the figure). 
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submodels. With the notation HPP, NHPP, RP and 
TRP used as before, and with an H in front mean- 
ing the model which includes heterogeneity, Figure 3 
shows how the HTRP and the seven sub-models can 
be represented in a cube [25]. Each vertex of the 
cube represents a model, and the lines that connect 
them correspond to changing one of the three "coor- 
dinates" (F,X(-),H) in the HTRP notation. Going 
to the right corresponds to introducing a time trend, 
going upward corresponds to entering a non-Poisson 
(renewal) case and going backward (inward) corre- 
sponds to introducing heterogeneity. 

In analyzing data by parametric HTRP models we 
may use the cube to facilitate the presentation of 
maximum log-likelihood values and parameter esti- 
mates for the different models in a convenient, visual 
manner which may guide model choice (see [48]). 
Figures 4 and 5 show maximum likelihood values 
computed from the data of Proschan [58] and Aalen 
and Husebye [3], respectively. The latter data set is 
taken from a medical study and is included here to 
demonstrate results for data which are clearly non- 
Poisson distributed. 
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Fig. 4. The log-likelihood cube for the data of Proschan [58] 
concerning failures of air conditioner systems on airplanes, 
fitted with a parametric HTRP(_F, A(-), H) model and its sub- 
models. Here F is a Weibull distribution with expected value 
1 and shape parameter s, X(t) = cbt 11 ^ 1 is a power function 
of t and H is a gamma distribution with expected value 1 
and variance v. The maximum value of the log-likelihood is 
denoted I. 
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Fig. 5. The log-likelihood cube for the data of Aalen and 
Husebye [3] concerning migratory motor complex periods, fit- 
ted with a parametric HTRP(i ? , A(-), H) model and its sub- 
models. Here F is a Weibull distribution with expected value 
1 and shape parameter s, \(t) = cfct 6-1 is a power function 
of t and H is a gamma distribution with expected value 1 
and variance v. The maximum value of the log-likelihood is 
denoted I. 

For the Proschan data we conclude that the re- 
newal distribution can be taken to be exponential, 
leaving us with the bottom face of the cube. Fur- 
ther, when comparing the front face to the back face 
there is clear reason to conclude that there is het- 
erogeneity between the systems, with Var(o) being 
estimated to approximately 0.11. The conclusions 
so far are thus in accordance with the conclusions of 
Proschan [58]. However, a comparison of the left and 
right faces of the cube reveals a slight time trend. In 
fact, twice the log-likelihood difference from HHPP 
to HNHPP amounts to 5.28, giving a p-value of 
0.022 assuming a chi-squared distribution with one 
degree of freedom of the corresponding likelihood 
ratio test statistic. The power parameter b of the 
trend function is, furthermore, estimated as 1.16. 

The most obvious conclusion for the Aalen and 
Husebye [3] data is that the renewal distribution 
is not exponential, implying that the upper face 
of the cube applies. Further, the differences in log- 
likelihood obtained by introducing heterogeneity are 
seen to be small enough to conclude there is no sig- 
nificant heterogeneity. However, as for the Proschan 
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Fig. 6. Plot of cumulative number of failures, N(t), for air conditioner failures of plane 7913 in the Proschan [58] data. 



data, there seems to be a slight time trend. Here, 
twice the log-likelihood difference from RP to TRP 
amounts to 4.18, giving a p- value of 0.041, while the 
power parameter b is estimated as 1.14 for the TRP 
model. Note the large difference in log-likelihood 
value between, for example, the TRP and NHPP 
models. As shown by the parameter estimates (Fig- 
ure 5), the NHPP estimates seem to compensate for 
the large estimated shape parameter for the renewal 
distribution of the TRP by increasing the power pa- 
rameter b of the trend function (from 1.14 to 1.45). 
It is also seen that for the Poisson models (bottom 
face) there is no gain in log-likelihood by introduc- 
ing heterogeneity. Thus the maximum likelihood es- 
timates of the heterogeneity variance v are given by 
the border value 0. This is so since the profile likeli- 
hood of v can be shown to be a decreasing function 
of v > near (see [48] for a further discussion of 
this effect). 

5. TREND TESTING 

In many applications involving repairable systems, 
the main aim is to detect trends in the pattern of 
failures that occur over time. These may often be re- 
vealed as monotonic trends in the interfailure times, 
corresponding to either improving or deteriorating 
systems. Various types of nonmonotonic trends may 
also be present, for example, a cyclic trend or a bath- 
tub shaped trend. 



5.1 Graphical methods 

A simple but informative way to check for a possi- 
ble trend in the pattern of failures is to study plots 
like Figure 6, which is a plot of cumulative failure 
number versus failure time for a single system. The 
underlying data are failures of the air conditioner 
system of airplane 7913 of the Proschan [58] data. 
A convex plot would be indicative of a deteriorat- 
ing system, while a concave plot would indicate an 
improving system. In Figure 6 there seems to be no 
significant deviation from a straight line, however, 
thus indicating no trend in interfailure times. 

5.1.1 Nelson- Aalen plot. The plot of Figure 6 is a 
special case of the Nelson-Aalen plot to be described 
next. Assume that m systems are observed, with 
the individual failure processes being independent 
and identically distributed. Suppose further that the 
ith process is observed on the time interval (0, Tj] 
and let y(t) denote the number of processes under 
observation at time t. Note that y(t) is a function 
of the Tj and not of the failure times. Let T\~ denote 
the fcth arrival time in the superposed process, that 
is, Ti is a failure time in one of the processes and 
< T\ < T% < ■ • ■ < Tjy < r, where r = maxjrj : i = 
1, . . . , m}. Define the cumulative mean function of a 
single process to be M(t) = E(N(t)). The Nelson- 
Aalen estimator of M{t) is given by 



M(t) = £ 



1 



y(T k y 
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where the sum is taken over all failure times be- 
fore or at time t. Figure 7 shows the plot of M(t) 
for the data on times of valve-seat replacements in 
a fleet of m = 41 diesel engines, taken from [53]. 
The plot indicates that the replacement frequency 
is fairly constant up to 550 days and then increases 
as revealed from the convex shape of the curve at 
the right end. 

The plot as defined here is studied, for example, in 
[53] and [42]. These papers also derive robust non- 
parametric estimates of the variance of M(t), valid 
under any distributional properties of the individual 
processes N(t). 

5.1.2 TTT plot. Consider the special case of the 
above where the m processes are independent NHPPs 
with a common intensity function A(t). The super- 
posed process is now a NHPP with intensity func- 
tion cf)(t) = X(t)y(t), and hence (see Section 3.4) the 
process J Tl 4>(u) du, J^ 2 cj>(u) du, ... is HPP(l) on (0, r) 
Define the total time on test (TTT) at time t by 

r(t) = / y(u) du. 
Jo 

Barlow and Davis [6] introduced the TTT plot for 
repairable systems data as a plot of the points 



j_ r(Tj) 
N 1 r(r) 



N. 



The idea is that if A(i) is a constant, so that the 
processes are HPP, then the r(Ti)/r(r),i = 1, . . . , N, 



form a HPP(l) on [0,1]. In this case the TTT plot 
is by its definition expected to be located near the 
main diagonal of the unit square. Under the alterna- 
tives of decreasing, increasing and bathtub-shaped 
intensity \(t), on the other hand, the TTT plots 
appear to be, respectively, convex, concave and S- 
shaped. Figure 8 shows the TTT plot of the valve- 
seat replacement data of Nelson [53]. The plot ap- 
pears to be fairly straight, but with a slightly con- 
cave shape near the end corresponding to the in- 
creasing intensity here as revealed by the Nelson- 
Aalen plot in Figure 7. 

5.2 Statistical Trend Tests 

Statistical trend tests for repairable systems data 
were extensively discussed by Ascher and Feingold 
[5], Chapter 5B. A trend test is a statistical test 
for the null hypothesis that the failure process is 
stationary, in some sense to be made precise, versus 
alternatives that depend on the kind of trend one 
would like to detect. Here we give main attention 
to the null hypothesis that the process is a HPP or 
more generally a RP. However, as will be discussed 
below, some care should be taken when determining 
the relevant null hypothesis. 

The null hypothesis of HPP is the most common 
and often the most useful in reliability applications. 
The corresponding null property, under the name 
"randomness," was studied in several papers in the 
1950s, and various tests for randomness in time were 




300 400 
Time 

Fig. 7. Nelson-Aalen plot of the estimated cumulative mean function M{t) for the valve-seat replacement data as given by 
Nelson [53]. 
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devised. Here randomness pertained to the prop- 
erty that counts in given time intervals are Poisson- 
distributed. Maguire, Pearson and Wynn [50], how- 
ever, discussed the advantages of using interevent 
times rather than counts to test for changes with 
time of the occurrence rate of events. Cox [17] stated 
eight different kinds of possible alternatives to ran- 
domness, one of them being trend in the sense that 
the conditional intensity is a smooth function of 
time. 

5.2.1 Tests of the null hypothesis of HPP. Sin- 
gle process. Suppose first that the null hypothesis 
is "the process is a HPP," with the alternative be- 
ing a NHPP with monotone intensity. Two classical 
trend tests for this case are the Laplace test and the 
Military Handbook test (see, e.g., [5], page 79). To 
see how they are obtained, consider a single system 
observed on [0, r] . If the failure process is a HPP, 
then given N(r) = re, the failure times T\,T2, . . . , T n 
are distributed as the ordering of n i.i.d. uniform 
random variables on [0,r]. Equivalently, the Tj/r 
(i = l,...,re) are distributed as ordered i.i.d. uni- 
forms on [0, 1] conditionally given N(t) = n. From 
this we can in principle obtain trend tests from any 
test for detecting deviations from a uniform sample. 
The Laplace test statistic is simply a normalization 
of J2i=i Ti j while the Military Handbook test statis- 
tic is similarly a normalization of X^=il°g^i- The 
Laplace test and the Military Handbook test are op- 
timal tests against the alternatives of NHPPs with, 
respectively, log linear intensity and power intensity 
functions ([5], page 79). 



Several processes. As in Section 5.1.2, assume that 
m independent NHPPs with a common intensity 
function X(t) are observed, where the ith process 
is observed on the time interval (0, Tj]. Recall that, 
under the null hypothesis that A(t) is a constant, 
the r(Ti)/r(r), i = l,...,N, form a HPP(l) on [0,1]. 
Kval0y and Lindqvist [35] suggested from this that 
formal trend tests could be defined by substituting 
the r(Tj)/r(r) into the Laplace and Military Hand- 
book test statistics. While these TTT-based tests 
are powerful against monotone alternatives, the au- 
thors suggested using a test statistic based on the 
Anderson-Darling statistic as a general test with 
power against several kinds of trend. 

For many applications, the null hypothesis needs 
to be weakened to state that each process is a HPP, 
but that intensities may differ from system to sys- 
tem. For example, in the data of Proschan [58], one 
may be interested in a simultaneous trend test for 
the systems, allowing there to be heterogeneities be- 
tween them. Kval0y and Lindqvist [35] suggested 
tests for this case called combined tests. A precise 
setting for these tests was recently defined by Kvist, 
Andersen and Kessing [37] , who considered a model 
where the conditional intensity function for a partic- 
ular system is given by ag(Z)X(t), where o is an un- 
observable frailty variable as considered in Section 
4.1, Z is a fixed-time covariate vector observed for 
each system, g is a parametric regression function 
and X(t) is a baseline intensity function. Suppose 
that such a process is observed on the time interval 
[0,t] with events at times Ti, T2, . . . , T N ^ . Then, 
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Fig. 8. TTT plot of valve-seat data as given in [53]. 



ANALYSIS OF REPAIRABLE SYSTEMS 



15 



conditional on (a, Z, r, N(r)), the T±/t, T2/T, . . . , 
^7V(t)/ t are distributed as N(t) ordered standard 
uniform variables on [0, 1] if X(t) is constant. We 
are hence back to the setting of the beginning of this 
subsection. In practice one observes m independent 
processes of this kind, with a common X(t), with the 
a being i.i.d. unobservable random variables and the 
Z being observed covariate vectors for each system. 
The above-mentioned combined tests by Kvafoy and 
Lindqvist [35] can thus be used to test the null hy- 
pothesis that X(t) does not depend on t. Kvist, An- 
dersen and Kessing [37] applied the Laplace type 
test of this kind on data from the Danish register 
on psychiatric hospital admissions. 

5.2.2 Tests of the null hypothesis of RP. The La- 
place test and the Military Handbook test are tests 
for the null hypothesis that the data come from 
HPPs. Thus rejection of the null hypothesis means 
merely that the process is not a HPP. It could still, 
however, be a RP and thus still have "no trend." 
Lawless and Thiagarajah [43] and Elvebakk [25] con- 
cluded from simulations that the Laplace and Mil- 
itary Handbook tests in fact may be seriously mis- 
leading when used to detect trend departures from 
general renewal processes. Similarly, Lewis and Robin 
son [44] noted that these tests are not able to dis- 
criminate properly between trends in the data and 
the appearance of sequences of very long intervals. 

To test the null hypothesis of RP, Lewis and 
Robinson [44] suggested modifying the Laplace test 
by dividing the test statistic by an estimate of the 
coefficient of variation of the interfailure times un- 
der the null hypothesis of a RP. This test, called the 
Lewis-Robinson test, is thus a simple modification 
of the Laplace test. Another classical trend test for 
the null hypothesis of RP is the rank test developed 
by Mann [51] and used by Proschan [58] (see Section 
4). 

Kvafoy and Lindqvist [36] presented a general class 
of tests for renewal process versus both monotonic 
and nonmonotonic trend for which the Lewis-Robinson 
and a useful Anderson-Darling type test are special 
cases. 

Elvebakk [25] demonstrated how tests for the null 
hypothesis of RP can be obtained from tests for the 
Poisson case by adjusting their critical values by re- 
sampling failure data under the RP hypothesis. The 
general conclusion of Elvebakk [25] was to recom- 
mend the use of such resampled trend tests when- 
ever it is not clear that the failure processes are of 



Poisson type. In particular he showed in a simula- 
tion study that the resampled tests are usually fa- 
vorable to the Lewis-Robinson test, and that they 
do not lose much power under NHPP alternatives 
when compared to the standard tests. 

5.2.3 Tests of the null hypothesis of stationary in- 
terfailure times. Lewis and Robinson [44] presented 
a test for distinguishing between a general station- 
ary sequence of interfailure times X{ and a mono- 
tonic trend in interfailure times. Elvebakk [25] ex- 
tended the resampling trend testing approach de- 
scribed in the previous subsection, to cover the case 
when "no trend" corresponds to stationary interfail- 
ure times. The idea is to resample data under this 
new null hypothesis assumption. Elvebakk did this 
both by a parametric approach assuming an under- 
lying autoregressive model and by employing a block 
bootstrap technique adapted from Hall [28]. Simu- 
lations indicated rather satisfactory performance of 
the method. 

5.2.4 Trend tests obtained as likelihood ratio tests. 
In parametric models which include separate param- 
eters for trend, trend tests may be performed as like- 
lihood ratio tests that involve these parameters. An 
example is to test the null hypothesis f3\ = in (5) 
which was suggested in [43] . Trend tests can also be 
obtained in models of the form HTRP(i ? , X(-),H) by 
testing the null hypothesis that A(-) = A using likeli- 
hood ratio tests. Note that this leads to tests of the 
null hypothesis that the processes are all renewal 
processes with a possibility of heterogeneity. 

A nonparametric likelihood ratio test for the null 
hypothesis of a HPP versus the alternative of a NHPP 
with monotone intensity A(-) was derived by Boswell 
[11]. A generalization to the null hypothesis of RP 
can be obtained using the nonparametric monotone 
estimator of A(-) in the TRP model derived by Heg- 
gland and Lindqvist [29]. 

REPAIRABLE SYSTEMS WITH SEVERAL 
TYPES OF EVENTS 

In this section we consider the general marked 
event process described in Section 2. The purpose 
is to show how new classes of maintenance and re- 
pair models can be obtained by generalizing the ap- 
proach of the imperfect repair models for single type 
events considered in Section 3.2. To simplify the pre- 
sentation we shall not allow covariates or hetero- 
geneity in the models considered here. 
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As in Section 3.2, we consider first a nonrepair able 
unit. Assume that this unit may fail due to one of 
several causes or may be stopped for PM before it 
fails, in which case failure is prevented. 

We can formally think of this as having a system 
with, say, n components, denoted {C\, C2, ■ ■ ■ , C n }, 
where a unique failing component can be identified 
at failures of the system and where PM, if appli- 
cable, is represented by one of these components 
so as to simplify notation. Let Wj be the poten- 
tial failure time due to failure of component Cj, 
j = 1, 2, . . . , n. What is observed is the failure time 
T = min(W\, . . . , W n ) and the identity of the failing 
component, say J = j if the component Cj fails. This 
determines a competing risks situation with n com- 
peting risks and with the observed outcome (T, J) 
([20], Chapter 3). The joint distribution of (T, J) 
is thus identifiable from data, as are the so-called 
type-specific hazards defined by 



(14) h 3 (t) 



lim 

At|o 



Pr(t < T < t + At, J = j\T > t) 
At ' 



However, neither the joint nor the marginal distribu- 
tions of the individual potential failure times W\ , . . . , 
W n are identifiable in general from observation of 
(T, J) only. This follows from the so-called Cox- 
Tsiatis impasse; see [20], Chapter 7. On the other 
hand, these marginal and joint distributions are in- 
deed of interest in reliability applications, for exam- 
ple, in connection with maintenance optimization. 
An example is given in the next paragraph. 

Consider the setup of Cooke [15, 16] that involves 
a competing risks situation with a potential failure 
of a unit at some time W\ and a potential action 
of preventive maintenance to be performed at time 
W2. Thus n = 2, while C\ corresponds to failure 
of the unit ( J = 1) and C2 (J = 2) corresponds to 
the action of PM. Knowing the marginal distribu- 
tion of W\ would be particularly important since 
it is the basic failure time distribution of the unit 
when there is no PM. However, as noted above, the 
marginal distributions of W\ and W2 are not iden- 
tifiable unless specific assumptions are made on the 
dependence between W\ and Wi- The most com- 
mon assumption of this kind is that W\ and W2 
are independent, in which case identifiability follows 
([61]; [20], Chapter 7). However, this assumption is 
unreasonable in the present application, since the 
maintenance crew is likely to have some information 
regarding the unit's state during operation. This in- 
sight is used to perform maintenance so as to avoid 



a failure. Thus we are in practice faced with a situa- 
tion of dependent competing risks between W\ and 
W2, and hence identifiability of marginal distribu- 
tions requires additional assumptions. 

Lindqvist, St0ve and Langseth [49] suggested a 
model called the repair alert model to describe the 
joint behavior of the failure time W\ and time W2 
of PM. This model is a special case of random signs 
censoring [15, 16] under which the marginal distri- 
bution of W\ is always identifiable. Recall that W2 
is said to be a random signs censoring of W\ if the 
event {W2 < Wi} is stochastically independent of 
W\, that is, if the event of having a PM before fail- 
ure is not influenced by the time W\ at which the 
system fails or would have failed without PM. The 
idea is that the system emits some kind of signal be- 
fore failure and that this signal is discovered with a 
probability which does not depend on the age of the 
system. The repair alert model extends this idea by 
introducing a so-called repair alert function which 
describes the "alertness" of the maintenance crew 
as a function of time. 

Another possibility to obtain identifiability of the 
distributions of W\ and W2 would be to use the re- 
sult of Zheng and Klein [64] , which shows identifia- 
bility of marginal distributions when the dependence 
is given by a known copula. 

Return now to the general case. Suppose that the 
system is repaired after failure and then put into 
operation, then may fail again and so on. This leads 
to a marked event process as described in Section 2 
with marks in J = {1, 2, . . . , n}, so that the mark at 
each event time is the number of the failing compo- 
nent (or more generally the type of event). 

The properties of this process depend on the re- 
pair strategy. Several classes of interesting models 
can be described in terms of a generalization of the 
virtual age concept introduced in Section 3.2, as dis- 
cussed in the next subsection. 

6.1 Virtual Age Models with Several Types of 
Events 

Recall from Section 3.2 that the class of virtual 
age models generalizes the perfect repair and mini- 
mal repair models, and that the approach more gen- 
erally leads to a large class of models. The main in- 
puts hazard function z(-), which is thought 
of as the hazard function of a new unit, and a vir- 
tual age process which is a stochastic process which 
depends on the actual repair actions performed. 
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Several generalizations of the standard imperfect 
repair models are found in the literature. Shaked 
and Shanthikumar [60] suggested a multicomponent 
imperfect repair model with components that have 
dependent life-lengths. Langseth and Lindqvist [38] 
suggested a model which involves imperfect mainte- 
nance and repair in the case of several components 
and several failure causes. In a recent paper, Doyen 
and Gaudoin [24] developed the ideas further by pre- 
senting a general point process framework for mod- 
eling imperfect repair by a competing risks situation 
between failure and PM. Bedford and Lindqvist [8] 
considered a series system of n repairable compo- 
nents where only the failing component is repaired 
at failures. 

Inspired by the mentioned approaches, we suggest 
in this section a generalization of the imperfect re- 
pair models to the case where there is more than 
one type of event and where the virtual age process 
is multidimensional. 

We let the first part of a virtual age model for 
n components be given by a vector process A(t) = 
(Ai(t) , . . . , A n (t)) that contains the virtual ages of 
the n components at time t. The crucial assumption 
is that A(t) = (Ai(t), . . . ,A n (t)) £ J~t-, which means 
that the component ages are functions of the history 
up to time t. 

As for the case with n = 1 in Section 3.2, it is 
assumed that the Aj(t) increase linearly with time 
between events, and may jump only at event times. 
We define Vj(i) to be the virtual age of component 
j immediately after the ith event. The virtual age 
process for component j is therefore defined by 

A j (t)=v j (N(t-)) + t-T N{t _ ) . 

The second part of a virtual age model in the case 
n = 1 consists of the hazard function z(-). For gen- 
eral n we replace this by functions Vj(vi, . . . ,v n ) for 
vx, V2, ■ ■ ■ ,v n > 0, such that the conditional intensity 
of type j events, given the history jF t „, is 

lj (t) = u j (A 1 (t),...,A n (t)). 

Thus Vj(vx, . . . , v n ) is the intensity of an event of 
type j when the component ages are v±, . . . , v n , re- 
spectively. The conditional intensity thus depends 
on the history only through the virtual ages of the 
components. 

The family {vj(v\, . . . , v n ) :v\, v 2 , ■ ■ ■ ,v n > 0} de- 
scribes the failure mechanisms of the components 
and the dependence between them in terms of the 
ages of all the components. The basic statistical 



inference problem therefore consists of estimating 
these functions from field data. The case n = 1 has 
already been discussed in Section 3.2, but we shall 
see that identifiability problems can occur when n > 
1. 

6.2 Repair Models and their Virtual Age 
Processes 

Most of the virtual age processes considered for 
the case n = 1 can be generalized to the present 
case of several event types. There are, however, of- 
ten several ways to do this. Some examples are given 
below. Additional examples include generalizations 
of Kijima's [34] models, which may be plausible in 
applications. 

6.2.1 Perfect repair of complete system. Suppose 
that all the components are repaired to as good as 
new at each failure of the system. In this case we 
have Vj{i) = for all j and i, and hence Aj(t) = 
t — T N n-\ for all j. It follows that we can only iden- 
tify the "diagonal" values Vj(t, ■ ■ ■ , t) of the functions 
Uj. As noted in Section 6.3, these are given by the 
type-specific hazards defined in (14) for the nonre- 
pairable competing risks case. This is not surprising 
in view of the fact that the present case of perfect 
repair essentially corresponds to observation of i.i.d. 
realizations of the nonrepairable competing risks sit- 
uation. 

6.2.2 Minimal repair of complete system. In the 
given setting a minimal repair will mean that fol- 
lowing an event, the process is restarted in the same 
state as was experienced immediately before the event. 
In mathematical terms, this implies that Vj{i) = Ti 
for all i,j and hence that Aj{t) = t for all j. Note 
that the complete set of functions Vj is again not 
identifiable. Moreover, for a single component it is 
well known that minimal repair results in a failure 
time process which is a NHPP. In the present case of 
several components which are minimally repaired, it 
follows similarly that the failure processes of the in- 
dividual components are independent NHPPs with 
the intensity for component j given by vj(t, . . . ,t), 
which as already noted equals the type-specific haz- 
ard (14). 

6.2.3 A partial repair model. Bedford and Lindqvist 
[8] suggested a partial repair model for the n com- 
ponent case. The virtual age process is defined by 
letting Aj{t) = time since last event of type j. Equiv- 
alently, the process could be defined by 
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Thus, the age of the failing component is reset to 
at failures, whereas the ages of the other compo- 
nents are unchanged. The authors considered a sin- 
gle realization of the process, with the main result 
being that under reasonable conditions pertaining 
to ergodicity, the functions Vj(v\, . . . ,v n ) are identi- 
fiable. The intuitive idea of their proof is that the 
ages vi,...,v n will mix in such a manner that the 
complete set of fj(vi, . . . ,v n ) can be identified. 

6.2.4 Age reduction models. Doyen and Gaudoin 
[24] considered a single component or system and 
two types of events: C\ = failure, C2 = PM. In their 
basic model the virtual ages of the two types of 
events are equal: A\(t) = Azit) = A(t). They indi- 
cated, however, that this restriction is not necessary. 
Various choices of virtual age processes were con- 
sidered. In particular they considered age reduction 
models that generalize these mentioned at the end 
of Section 3.2. More precisely, assume that there are 
given age reduction factors < p\ , pi < 1 for the two 
types of events. The virtual age immediately after 
the ith repair is then 

v(i) = (l-p Ji )(v(i-l)+X i ), 

which means that the virtual age immediately be- 
fore the ith failure, v(i — 1) + Xj, is reduced due to 
repair by the factor 1 — pj i . Alternatively, if only the 
additional age Xj is reduced by the repair, it could 
be assumed that v(i) = v(i — 1) + (1 — pj^Xi. 

6.3 Modeling the Intensity Functions Uj 

In principle the functions Vj(vi, . . . , v n ) could be 
any functions of the component ages. Bedford and 
Lindqvist [8] motivated these functions by writing, 
for j= l,...,n, 

(15) Vj(vi,...,v n ) =Xj(vj) +A J -*(v 1 ,...,u„) 

with the convention that \j*(v\, . . . ,v n ) = when 
all the component ages except the jth are 0, so as 
to have uniqueness. Then Xj(vj) is thought of as the 
intensity of component j when working alone or to- 
gether with only new components, while \j*(vi, . . . , 
v n ) is the additional failure intensity imposed on 
component j caused by the other components when 
they are not all new. Note that any functions of 
v±,...,v n can be represented this way, by allowing 
the Aj* to be negative as well as positive. 

Langseth and Lindqvist [38] and Doyen and 
Gaudoin [24] extended the competing risks situa- 
tion between failure and PM, as described at the 



beginning of the present section, and suggested how 
to define suitable functions Vj. The main ideas of 
these approaches can be described for general n as 
follows. Starting from a state where the component 
ages are, respectively, v\, . . . , v n , let the time to next 
event be governed by the competing risks situation 
between the random variables W*, ■ ■ ■ , W* with dis- 
tribution equal to the conditional distribution of 
W\ - vi, . . . , W n - v n given W\ > v x , . . . , W n > v n , 
where the Wi are defined in the nonrepairable case 
described at the beginning of the section. It is then 
rather straightforward to show that this implies 



(16) Vj(vi,...,v n ) 



-djRjvi, . . . ,v n ) 
R(vi,...,v n ) 



where R{v\, ...,v n ) = P{W\ > vi, . . . ,W n > v n ) is 
the joint survival function of the Wj, and dj means 
the partial derivative with respect to the jth entry in 
R. Note that this generalizes the usual hazard rate 
in the case n = 1 considered in Section 3.2. Further, 
we have Vj(t, t, . . . , t) = hj(t), where the latter is the 
type-specific hazard rate given in (14). 

A final remark on the suggested construction of 
the functions Vj is due. It was demonstrated by Bed- 
ford and Lindqvist [8] that, even in the case with 
n = 2, it is not always possible to derive a general 
set of functions Vj{y\, . . . , v n ) from a single joint sur- 
vival distribution as in (16). A simple counterexam- 
ple was given in [8]. Thus for generality one should 
stick to completely general representations like (15). 

7. CONCLUDING REMARKS 

In the present paper we have reviewed some main 
approaches for the analysis of data from repairable 
systems. To a large extent the emphasis has been on 
describing the underlying principles and structures 
of common models. Essential features of such mod- 
els correspond to the three dimensions of the model 
cube in Figure 3: renewal property, time trend and 
heterogeneity. The presentation places less empha- 
sis on statistical inference than on modeling. How- 
ever, it has been an intention to show how likelihood 
functions are obtained for the different models. It is 
also indicated how covariates can be included in the 
models and the corresponding likelihood functions. 
While the derived likelihood functions can be used in 
a rather straightforward manner in parametric sta- 
tistical inference, there turn out to be several chal- 
lenging problems connected to nonparametric esti- 
mation in some of the models. 
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Two main types of models with rather simple and 
transparent basic structures have been considered. 
These are the virtual age type models and the TRP 
type models. The former type combines two basic 
ingredients: a hazard rate z(-) of a new system to- 
gether with a particular repair strategy which gov- 
erns the virtual age process A(t). The renewal di- 
mension is taken care of by the virtual age process, 
while trend is determined by the distribution of a 
new system. For the TRP(.F, A(-)), the renewal di- 
mension corresponds to the renewal distribution F, 
while the trend is explicitly given by the trend func- 
tion A(-). For both types of processes, heterogeneity 
can be included by multiplicative factors working on 
the intensities. A noticeable difference between the 
two types of models as regards statistical inference is 
that the virtual age type model usually requires that 
the virtual age process be observable. Such observa- 
tions may, however, often be lacking in real data. 

Many processes show some degree of clustering of 
failures. This may be due to various causes; see, for 
example, [17]. Several models have been suggested 
in the literature, a classical one being the Neyman 
and Scott [54] model. As pointed out by a referee, 
even the TRP model can pick up the clustering effect 
by allowing the renewal distribution to be a mixture 
with a substantial amount of probability near zero. 

Peha [55] has reviewed a class of models suggested 
in [56]. These are virtual age models which include 
the possibility of heterogeneity between systems, time- 
dependent covariates, and for which in addition the 
conditional intensities may depend on the number of 
previous events. This last feature adds an interest- 
ing flexibility to the model. In particular it enables 
modeling of certain load sharing processes and soft- 
ware failure processes. 

Certain systems, for example, alarm systems, are 
tested only at fixed times which are usually peri- 
odic. If the system is found in a failed state, then 
it is repaired or replaced. Thus repair is not done 
at the same time as the failure, and the situation is 
not covered by the methods considered in the pa- 
per. A simple model of this situation was suggested 
by Hokstad and Fr0vig [30] and further studied and 
extended by Lindqvist and Amundrustad [47]. Con- 
sider a system which starts operation at time t = 

and is tested at time epochs r, 2r, 3r, When time 

is running between testing epochs, the state of the 
system is modeled by an absorbing Markov chain. 
Having thus defined the probabilistic behavior of the 
system state between testing, one needs to add to 



the model a specification of the repair policy. In [47] 
this is modeled in the form of a transition matrix on 
the state space of the Markov chain, which defines 
the possible changes of state and their probabilities 
following the repair actions. 

In a given study there is usually a choice between 
several types of models. It is thus important to have 
tools for model checking and goodness-of-fit pro- 
cedures. For model checking in parametric estima- 
tion of the HTRP model, we refer to [48], which 
used a type of Cox-Snell residuals together with 
plots using the TTT technique. The general un- 
derlying idea, which in principle can be used with 
all estimation methods considered in this paper, is 
that the process of integrated conditional intensities, 
fPlWdtJ^iWdt,..., is HPP(l) [12]. In turn 
this gives rise to computable residual processes when 
estimates are inserted for parameters and distribu- 
tions. The use of these processes in model checking 
is demonstrated for three different data sets in [48]. 
Typically, one would check (i) the distribution of 
the residuals with respect to departures from the 
unit exponential distribution, (ii) the possible pres- 
ence of time trends in residuals within each system 
and (iii) the possible presence of autocorrelation in 
times between events in the residual processes. 
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